Steady-state parameter sensitivity in stochastic modeling via trajectory reweighting 
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Parameter sensitivity analysis is a powerful tool in the building and analysis of biochemical net- 
work models. For stochastic simulations, parameter sensitivity analysis can be computationally 
expensive, requiring multiple simulations for perturbed values of the parameters. H[ere, we use 
trajectory reweighting to derive a method for computing sensitivity coefficients in stochastic simula- 
tions without explicitly perturbing the parameter values, avoiding the need for repeated simulations. 
The method allows the simultaneous computation of multiple sensitivity coefficients. Our approach 
recovers results originally obtained by application of the Girsanov measure transform in the general 
theory of stochastic processes [A. Plyasunov and A. P. Arkin, J. Comp. Phys. 221, 724 (2007)]. 
We build on these results to show how the method can be used to compute steady-state sensitiv- 
ity coefficients from a single simulation run, and we present various efficiency improvements. For 
models of biochemical signaling networks the method has a particularly simple implementation. We 
demonstrate its application to a signaling network showing stochastic focussing and to a bistable 
genetic switch, and present exact results for models with linear propensity functions. 

PACS numbers: 87.18.Tt, 87.10.Mn, 02.50.Ga 



I. INTRODUCTION 

Parameter sensitivity analysis is one of the most impor- 
tant tools available for modelling biochemical networks. 
Such analysis is particularly crucial in systems biology, 
where models may have hundreds of parameters whose 
values are uncertain. Sensitivity analysis allows one to 
rank parameters in order of their influence on network 
behaviour, and hence to target experimental measure- 
ments towards biologically relevant parameters and to 
identify possible drug targets. For deterministic models, 
the adjunct ODE method provides an efficient way to 
compute the local sensitivity of a model to small changes 
in parameters. For stochastic models, however, parame- 
ter sensitivity analysis can be computationally intensive, 
requiring repeated simulations for perturbed values of 
the parameters. Here, we demonstrate a method, based 
on trajectory reweighting, for computing local parameter 
sensitivity coefficients in stochastic kinetic Monte-Carlo 
simulations without the need for repeated simulations. 

Sensitivity analysis of biochemical network models 
may take a number of forms. One may wish to deter- 
mine how a model's behaviour changes as a parameter 
is varied systematically within some range (a parameter 
sweep) , its dependence on the initial conditions of a simu- 
lation, or its sensitivity to changes in the structure of the 
model itself (alternate mode-of- action hypotheses). In 
this paper, we focus on the computation of local param- 
eter sensitivity coefficients. These coefficients describe 
how a particular output / of the model varies when the 
a-th parameter of the model, , is varied by an infinites- 
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where /' is the output of the model computed in a sys- 
tem with ka changed to ka + h. For deterministic mod- 
els, where the dynamics of the variables Xi{t) can be 
described by a set of deterministic ordinary differential 
equations (ODEs) dxi/dt — g{xi,ka), differentiation of 
the ODEs with respect to ka shows that the sensitivity 
coefficients Ci^a = dxi/dka obey an adjunct set of ODEs, 
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These adjunct ODEs can be integrated alongside the 
original ODEs to compute the sensitivity coefficients "on 
the fly" in a deterministic simulation of a biochemical 
network. 

Stochastic models of biochemical networks are (gen- 
erally) continuous-time Markov processes [T] which are 
solved numerically by kinetic Monte-Carlo simulation, 
using standard methods such as the Gillespie .2] or 
Gibson-Bruck [3] algorithms. Replicate simulations will 
produce different trajectories; we wish to compute how 
the average value (/) of some function fit) of the model 
changes with the parameter ka'- 
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where the averages are taken across replicate simula- 
tion runs. If one is interested in steady-state {i. e. 
time- independent) parameter sensitivities, the averages 
m Eq. (§ may instead be time averages taken over a sin- 
gle simulation run. Naive evaluation of parameter sensi- 
tivities via Eq. ([3| is very inefficient, since one is likely 
to be looking for a small difference between two fluctu- 
ating quantities. There are several existing approaches 
that get around this problem: spectral methods [51, a 
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method based on the Girsanov measure transform [5], 
and methods which re-use the random number streams 
[5]. In this paper, we develop a method based on tra- 
jectory reweighting, which is simple to implement in ex- 
isting kinetic Monte Carlo codes and provides a way to 
compute steady-state parameter sensitivity coefficients 
"on-the-fly" in stochastic simulations of biochemical net- 
works. The method provides an accessible alternative to 
the Girsanov measure transform pioneered by Plyasunov 
and Arkin [5]. Indeed several of our equations in Sec- 
tion |TT] are equivalent to those Ref. [3 However, we go 
beyond previous work by showing in practical terms how 
the method can be implemented in standard stochastic 
simulation algorithms, extending the method to the com- 
putation of parameter sensitivities in the steady state, 
and showing how time-step preaveraging can be used to 
improve the efficiency of the calculations. 



II. TRAJECTORY REWEIGHTING 

The basic idea behind trajectory reweighting is as fol- 
lows. In a kinetic Monte-Carlo simulation, for a given 
set of parameters any given trajectory has a statistical 
weight which measures the probability that it will be 
generated by the algorithm this weight can be ex- 

pressed as an analytical function of the states of the sys- 
tem along the trajectory and of the parameter set. This 
analytical function also allows us to compute the statisti- 
cal weight for this same trajectory, in a system with a dij- 
ferent set of parameters: i. e. its weight in the ensemble 
of trajectories with perturbed parameters. This allows 
us in principle to compute the average (/)' in Eq. (§ for 
the perturbed parameter set, using only a set of trajecto- 
ries generated with the unperturbed parameter set. For 
most applications this is inefficient, because the weight of 
a trajectory in the perturbed ensemble is typically very 
low, resulting in poor sampling. However, it turns out 
that trajectory reweighting does provide an effective way 
to compute local parameter sensitivity coefficients. 



Trajectory reweighting for kinetic Monte-Carlo 
simulations 

More specifically, let us consider a typical implementa- 
tion of the Gillespie algorithm (similar arguments ap- 
ply to more recent algorithms, such as Gibson-Bruck [3]). 
Here, the state of the system is characterised by a set of 
discrete quantities Ni, typically representing the number 
of molecules of chemical species i. Transitions between 
states are governed by propensity functions af^{Ni,ka) 
where /i labels the possible reaction channels and the 
quantities ka are parameters in the problem, typically 
reaction rates (fca represents the a-th such parameter). 
A kinetic Monte-Carlo trajectory is generated by step- 
ping through the space of states in the following way. 
We first compute the propensity functions a^{Ni) for all 



the possible transitions out of the current state. We then 
choose a time step (i e. waiting time) 6t from an expo- 
nential distribution p{St) = r^^e^"'*/'^, where the state- 
dependent mean timestep (the mean waiting time be- 
fore exiting the current state) is {St) ~ t = (X]/j"m)~"^- 
We choose a reaction channel with probability — 
a^/^^a^. We advance time by 6t and update the val- 
ues of Ni according to the chosen reaction channel. We 
are now in a new state, and the above steps are repeated. 

Now let us consider the statistical weight of a given tra- 
jectory generated by this algorithm 7-9 . In each step, 
the probability of choosing the value of 6t that we actu- 
ally chose is proportional to r~^e~*'/'^, and the proba- 
bility of choosing the reaction channel that we actually 
chose is equal to a^/X^^i'^M- '^^^ therefore associate 
a weight P with the whole trajectory, which is propor- 
tional to the probability of generating the sequence of 
steps which we actually generated: 

P =n.tepJ«M/E,a.)x(r-ie-^*/-) 

The second line follows by eliminating (I/X^^j'^/j) ^ 
r^^ = 1 (note that because Eq. Q is not normalized, 
P is a weight rather than a true probability). 

In a typical kinetic Monte-Carlo simulation, we gen- 
erate multiple independent trajectories of length for a 
given parameter set. The probability of generating any 
given trajectory in this sample will be proportional to 
its weight P, defined in Eq. Q. We then compute the 
average {f{t)) of some function f{t) of the state of the 
system by summing over the values of /, at time t, for 
these trajectories. 

Having generated this set of trajectories, let us now 
suppose we wish to re-use them to compute the aver- 
age {f{t))' which we would have obtained had we re- 
peated our simulations for some other parameter set. It 
turns out that we can compute this average by summing 
over the same set of trajectories, multiplied by the ratio 
of their statistical weights for the perturbed and unper- 
turbed parameter sets. To see this, we first recall that an 
average, e. g. (/(i)), can be written as a sum over all pos- 
sible trajectories j of length t, multiplied by their statisti- 
cal weights Pf {fit)} = {j:,PjfAt))/iJ:,Pj)- Writing 
the perturbed average {f{t))' in this way, we obtain 

^J^^>' - E,-p; ~ i:,{Pi/p,)p, ~ (p'/p) 

where P and P' are the trajectory weights (calculated 
using Eq. Q) for the original and perturbed models 
respectively. In another context, Eq.([5]) has been used 
to reweight trajectory statistics in order to sample rare 
events in biochemical networks [9] ; it also forms the basis 
of umbrella sampling methods for particle-based Monte 
Carlo simulations [TU] . 

Whilst in principle Eq.([5]) provides a completely gen- 
eral way to transform between trajectory ensembles with 
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different parameter sets, in practice it is useless for any 
significant deviation of the parameter set from the origi- 
nal values, for two reasons. First, the statistical errors in 
the computation of (/)' grow catastrophically with the 
size of the perturbation, because the original trajectories 
become increasingly unrepresentative of the perturbed 
model. Second, the computational cost of determining 
the trajectory weights for the perturbed and unperturbed 
parameter sets via Eq. Q is only marginally less than the 
cost of computing (/)' directly by generating a new set 
of trajectories for the perturbed parameter set. 



Computation of parameter sensitivity coefficients 

It turns out, however, that Eq. ^ is useful for the com- 
putation of parameter sensitivity coefficients, where the 
deviation between the original and perturbed parameter 
sets is infinitesimal. Let us suppose that the perturbed 
problem corresponds to a small change in a single param- 
eter, such as k'^ = ka + h; the corresponding sensitivity 
coefficient is defined by Eq. As we show in Supple- 
mentary Material Section [l] differentiating Eq. ([5| with 
respect to k'^ leads to the following expression for the 
sensitivity coefficient: 



dka 



= {fWkJ-{f){WkJ. 
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Supplementary Material Section[l]also shows how to gen- 
eralize this approach to higher-order derivatives. Com- 
bining Eq. Q with Eq. ([t]) shows that the "weight func- 
tion" Wk^ can be expressed as a sum over all steps in the 
trajectory: 



steps 



5Wu 



where 



aina^ d{Y,a^,)^^ 



dka 



dka 



(8) 



(9) 



Eqs. ([6|~([9]) are the key results of this paper, since 
they point to a practical way to compute parameter sen- 
sitivity coefficients in kinetic Monte-Carlo simulations. 
To evaluate the (time-dependent) parameter sensitivity 
d{f{t)) /dka, one tracks a weight function Wk^{t), which 
evolves according to Eqs. Q and ([9|. One also tracks 
the function f(t) of interest. The covariance between 
Wk^ and /, at the time t of interest, computed over mul- 
tiple simulations, then gives the sensitivity of (/(i)) to 
the parameter in question (as in Eq. ([6])). Tracking Wk^ 
should be a straightforward addition to standard kinetic 
Monte-Carlo schemes. Moreover we note that / could 



be any function of the variables of the system — for ex- 
ample, if one were interested in the parameter sensitiv- 
ity of the noise in particle number Ni, one could choose 
f{Ni) — Nf . More complex functions of the particle 
numbers, involving multiple chemical species, could also 
be used (see examples below). 

This prescription for computing parameter sensitivities 
presents, however, some difficulties in terms of statisti- 
cal sampling. The two terms in Eq. ^ are statistically 
independent quantities with the same expectation value, 
d\n{'^a^j)/dka- Hence they cancel on average but the 
variances add. Thus we expect that Wk^ is a stochastic 
process with a zero mean, (WkJ) = 0; ^-nd & variance 
that should grow approximately linearly with time — as 
shown for a simple example case in Supplementary Ma- 
terial Section[2] — in effect Wk^ behaves as a random walk 
(i. e. a Wiener process). In terms of controlling the sam- 
pling error, this means that the number of trajectories 
over which the covariance is evaluated should increase in 
proportion to the trajectory length, since the standard 
error in the mean is expected to go as the square root of 
the variance divided by the number of trajectories 
In Section |III[ we discuss a way to avoid this problem, 
when computing steady-state parameter sensitivities. 



Simplifications and practical implementation 

Without loss of generality we can presume that the 
parameter ka will appear in only one of the propensity 
functions, which we call Oq- |12j . With this presumption, 
Eq. ([9| becomes 
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Eq. ( 10 ) makes a direct link with the Girsanov measure 



transform method introduced by Plyasunov and Arkin, 
being essentially the same as Eq. (31b) in Ref. El 

A further simplification occurs if ka is the rate coef- 
ficient of the a-th reaction, so that Oq, is linearly pro- 
portional to ka- One then has dlnua/dka = l/ka and 
Eq. ([8| becomes 



1 

ka 



- Estops ^aSt 



(11) 



where Qa counts the number of times that the a-th re- 
action is visited. This is essentially the same as Eq. (9b) 
of Plyasunov and Arkin's work, Ref. \5\ 

Eq. (11) suggests a very simple way to implement 



parameter sensitivity computations in existing kinetic 
Monte-Carlo codes. One simply modifies the chemical 
reaction scheme such that each reaction whose rate con- 
stant is of interest generates a "ghost" particle in addi- 
tion to its other reaction products (this is similar to the 
clock trick in Ref. [T3|) . There should be a different flavour 
of ghost particle for each reaction of interest, and ghost 
particles should not participate in any other reactions. 
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Qa (t) is then simply given by the number of ghost parti- 
cles associated with the a-th reaction which are present 
at time t. In Section|IVj we use this approach to compute 
sensitivity coefficients using the unmodified COPASI |T3] 
simulation package. In Supplementary Material Section 
[2] we also exploit this trick to obtain some exact results 
for linear propensity functions. 



III. STEADY STATE 

So far, we have discussed the computation of time- 
dependent parameter sensitivity coefficients d{f{t))/dka, 
by evaluating the covariance of the weight function 
Wk^{t) with the function f(t) over multiple simulation 
runs. Often, however, one is interested in the parame- 
ter sensitivity of the steady-state properties of the system 
d{f)ss/dka; this is a time-independent quantity. We now 
discuss the computation of steady-state parameter sen- 
sitivities using trajectory reweighting. We show that in 
this case, first, the problem of poor sampling of Wk^{t) 
for long times can be circumvented, second, one can ob- 
tain sensitivity coefficients from a single simulation run, 
and third, one can improve efficiency by a procedure 
which we call time-step pre-averaging. 



The ensemble-averaged correlation function method 

To compute steady-state parameter sensitivities, one 
might imagine that we could simply apply the method 
discussed in Section |llj taking the limit of long times, 
when the system should have relaxed to its steady state. 
However, this does not work, because the variance be- 
tween trajectories of the weight function Wk^ increases 
in time, making it impossible to obtain good statistics at 
long times. To circumvent this problem, we note that the 
right hand side of Eq. ^ is unaltered if Wk^ is offset by 
a constant. Thus we may write the parameter sensitivity 
in the form of a two-point time-correlation function: 
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and to is some arbitrary reference time such that t — t^ = 
Ai > 0. This relation has the advantage that we may 
choose Ai sufficiently small to make the variance of 
Al/Ffc^(i,to) manageable. Importantly, in steady-state 
conditions, we expect that the correlation function de- 
pends only on the time difference and not separately on 
the two times, so that C(t, io) = C'(Ai), with C(0) = 
and C(At) — >■ d{f)ss/dka as At — > 00. Thus to cal- 
culate the sensitivity coefficient under steady state con- 
ditions, all we need to do is compute the steady-state 



correlation function defined in Eq. (12), choosing a suit- 
able "reference" time to when the system is already in 
the steady-state, then take the asymptotic (large At) 
value of this correlation function. We expect the cor- 
relation function to approach its asymptotic value on a 
timescale governed by the (likely short) relaxation time 
spectrum in the steady state, so that for most problems 
large values of At should not be required. Noting that 
in this method, as in Section |ll| the averages in Eq. (12) 
are computed over multiple independent simulation runs, 
we term this approach the ensemble-averaged correlation 
function method. 

From a practical point of view, this method involves 
the following set of steps or 'recipe': 

1. Choose two time points ti and ^2 such that the 
system has already reached its steady state at time 
ti and ^2=^1 + At where At is greater than the 
typical relaxation time of the quantity / of interest 
(typically this is the same as the longest relaxation 
time in the system as a whole). 

2. Compute Wk^ at times ti and t2 and / at time ^2- 

3. Calculate the difference AWk^ = Wk^{t2) — 
Wk^{ti). Compute also the product f{t2)AWk^- 

4. Repeat steps 1-3 for many independent simulation 
runs and compute the averages (/(t2)), (AVFfc„ (^2 — 
ti)) and {f{t2)AWk^{t2 — ti)) over the replicate 
simulations. 

5. Calculate the correlation function C{At) — 
{fAWkJ - {f)(AWkJ. As long as At is large 
enough this provides a measurement of d{f)ss/dka. 



The time-averaged correlation function method 

It turns out, however, that for steady-state parameter 
sensitivities, we do not need to average over multiple sim- 
ulation runs — we can instead compute time-averages over 
a single simulation run. This amounts to replacing the 
steady-state ensemble averaged sensitivity d{f)ss/dka by 
the time averaged version df^^/dka, where 



/ 
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(recalling that in kinetic Monte Carlo, the timestep St 
varies between steps). In principle, df^^/dka could 
be obtained by computing the time-averaged version of 
Eq. pi): 



,iAt)^ fit)AWkAt,to)-fit) AWkAt,to) (15) 



and taking the limit of large At = t—to- Eq. ( 15 ) requires 



one to keep track of Wk^ a precise time At in the past; 
since the time step is not constant in kinetic Monte Carlo, 
this is rather inconvenient to implement. 
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Fortunately, however, tracking the weight function at 
a precise time in the past turns out to be unnecessary. As 
At becomes large, the stochastic differences between in- 
dividual time steps cancel out and it becomes equivalent 
simply to compute the average 



Ctimc av(n) = fit) AWk^ (n) - fit) AWk^ in) (16) 
where 



AWk^ in) = Wk^ it) - Wk^ in steps ago). 



(17) 



and to use the fact that Ctimc av('^) df^^/dka as 
n — CXI. One can quite easily keep track of AVF^^ (n), for 
instance by maintaining a circular history array storing 
over the last n steps. This approach, which we de- 
note the time-averaged correlation function method, has 
the important advantage that one can obtain the steady 
state parameter sensitivity from a single simulation run. 

The recipe for using the time-averaged correlation 
function method is then: 

1. Choose a time interval At which is greater than the 
typical relaxation time of the quantity / of interest. 
Estimate the typical number of steps n taken in 
time At: n = At/f. 

2. For a simulation of the system in the steady state, 
record / and Wk^ every n steps (we denote each of 
these recordings a 'timeslice'). 

3. For each timeslice i, compute the difference be- 
tween Wf. and its value in the previous timeslice: 



W^'^^K Compute also /WAVF^ 



4. Compute the averages over all timeslices of /, 
AWk^ and fAWk^. 

5. Calculate the correlation function C(n) = 
fAWk^ — if)iAWk^)- As long as n is large enough 
this provides a measurement of df^^/dka- 



for a given state of the system. It can be proved formally 
that if we run our simulations for a sufficiently long time. 



Eq. ( 14 ) is equivalent to 
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Intuitively, this relation arises because a sufficiently long 
trajectory, under steady state conditions, will visit each 
state an arbitrarily large number of times and thus thor- 
oughly sample the distribution of waiting times in each 
state. 

One cannot, however, compute the parameter sensitiv- 
ity dL^/dka simply by evaluating the time averages in 
Eq. (15) or (16) using the new definition, Eq. (18). This 



is because r itself depends on the parameter ka- Instead, 
a slightly more complicated expression for df^^/dka is 
required; this is given in Supplementary Material Sec- 
tion [3] Thus, time-step pre-averaging provides a more 
efficient way to compute the steady-state parameter sen- 
sitivity (since the time does not need to be updated in the 
Monte Carlo algorithm), at the cost of a slight increase 
in mathematical complexity. 



IV. EXAMPLES 

We now apply the methods described above to three 
case studies: a model for constitutive gene expression 
for which we can compare our results to analytical the- 
ory, a simple model for a signaling pathway with stochas- 
tic focussing, and a model for a bistable genetic switch. 
The second and third examples are chosen because they 
exhibit the kind of non-trivial behaviour found in real 
biochemical networks, yet the state space is sufficiently 
compact that the parameter sensitivities can be checked 
using finite-state projection (FSP) methods [15 . Our im- 
plementation of the FSP methods is described more fully 
in Supplementary Material Section |4] 



Time-step pre-averaging 

The time-averaged correlation function method is a 
convenient way to compute parameter sensitivities in 
a standard kinetic Monte Carlo scheme, in which both 
a new timestep and a new reaction channel are cho- 
sen stochastically at every step. However, choosing a 
new time step at every iteration is computationally ex- 
pensive since it requires a random number, and is not 
strictly necessary for the computation of steady-state pa- 
rameter sensitivity coefficients. Improved efficiency can 
be achieved by choosing only the new reaction channel 
stochastically at each iteration, and replacing St by the 
mean timestep r = (X^/x^m)"^ corresponding to the cur- 
rent state (note that this is state dependent since it de- 
pends on the propensity fimctions). This amounts to 
pre-averaging over the distribution of possible time steps 



A. Constitutive gene expression 

We first consider a simple stochastic model for the ex- 
pression of a constitutive (unregulated) gene, represented 
by the following chemical reactions: 

0AmA0, mAm + N, nA0. (19) 

Here, M represents messenger RNA (synthesis rate fc, 
degradation rate 7) and N represents protein (synthe- 
sis rate p, degradation rate fi). This model has lin- 
ear propensities (as defined in Supplementary Material 
Section [2]), which implies that the mean copy numbers 
(M(i)) and (Nit)) of mRNA and protein respectively 
obey the chemical rate equations 

^ = fc-A(M), '-^ ^ piM) - ,iN) (20) 
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FIG. 1: (color online) Time correlation functions C(At) for 
the sensitivity of the average protein number {N)bb (top) and 
the average mRNA number {M}ss to the model parameters, 
for the constitutive gene expression model in Section |1V A[ 
Points with error bars are simulations using the ensemble- 
average correlation function method; error bars are esti- 
mated by block-averaging (100 blocks of 10'' trajectories; to- 
tal « 10^" time steps). Open circles are simulations using the 
time-average correlation function method with time step pre- 
averaging (plotted as a function of nr); results are averages 
over 10 trajectories each of length 10^ steps (in this case the 
error bars are smaller than symbols). Solid Unes are theoreti- 



cal predictions from Eq. (221. Parameters are A; = 2.76 min~ , 
A = 0.12min~^, p = 3.2min~^, fi — O.OlGmin"^, correspond- 
ing to the cro gene in a recent model of phage lambda [16] . 
For these parameters (M)aa = 23 and (A'^)ss = 4600. 



from which follow the steady state mean copy numbers: 



(21) 



and 
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(22) 

where for notational convenience we consider the sen- 
sitivity with respect to the logarithm of the parameter 
value (e. g. Wink = kWk). 

Figure [T] shows the time correlation functions of 
Eqs. (fT2|) and (13), computed over multiple stochastic 



simulation runs using the ensemble-averaged correlation 
function method, together with the analytical results of 
Eq. ( [22| (solid lines). The agreement between the ana- 
lytic theory and simulation results is excellent. The time 
correlation functions converge to the expected steady 
state sensitivity coefficients (horizontal lines in Fig. II]) . 
For the protein correlation functions {d\n{N)ss/d\nka), 
this occurs on a timescale governed by the relaxation rate 
of protein number fluctuations l//i ~ 60 min, while the 
mRNA correlation functions {d ln{M)ss/d In ka) reach 
their asymptotic values on a timescale governed by the 
mRNA decay rate 1/A ~ 8 min. 

Figure [l] (open circles) also shows the same correla- 
tion functions, computed instead from a single stochastic 
simulation run, using the time-averaged correlation func- 
tion method, with time-step pre-averaging. Although 
this method gives correlation functions (Eqs. (16) and 



(17)) in terms of the number of steps n in the history 
array, rather than the time difference At, these can be 
converted to time correlation functions by multiplying n 
by the expected global mean time step r (the average 
over states of the state-dependent mean time step r). 

Comparing the results of the ensemble-averaged and 
time-averaged correlation function methods in Fig. [T] we 
see that the two methods give essentially the same re- 
sults, but the time-averaged method produces greater 
accuracy (smaller error bars), for the same total num- 
ber of simulation steps. Moreover, because we have used 
time-step pre-averaging with the time-averaged correla- 
tion function method, each simulation step is computed 
approximately twice as fast as in the original kinetic 
Monte Carlo algorithm, since one does not need to gen- 
erate random numbers for the time steps |17j . 



B. Stochastic focusing 



For this problem, steady-state sensitivity coefficients 
can be computed analytically by taking derivatives of 
Eqs. (21) with respect to the parameters of interest. 



Moreover, as shown in Supplementary Material Section 
[2] explicit expressions can also be found for the compo- 
nents of the correlation functions defined by Eqs. ( 12 ) 



We now turn to a more sophisticated case study, based 
on the stochastic focusing model of Paulsson et al. [TB] ■ 
In this biochemical network, a input signal molecule S 
downregulates the production of an output signal (or re- 
sponse) molecule R. Stochastic fluctuations play a crucial 
role, making the output much more sensitive to changes 
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in the input than would be predicted by a deterministic 
(mean-field) model. 

Our reaction scheme, given in Eq. ( |23[ ), contains just 
two chemical species, S and R. The production and degra- 
dation of S (the input signal) are straightforward Poisson 
processes with rates ks and kd respectively. The produc- 
tion of R (the output signal) is negatively regulated by 
S, and its degradation rate is set to unity to fix the time 
scale. Thus we have 



k* 



R 



(23) 



where we use a Michaelis-Menten-like form to represent 
the negative regulation: k* = kp/{S + K), with S being 
the copy number of the input signal molecule. Taking a 
mean-field approach, we might suppose that the average 
copy numbers (S) and (R) should obey the chemical rate 
equations 



d{S) 
dt 



d{R) 
dt 



{S)+K 



(R). (24) 



and that therefore the steady state copy numbers should 
be given by 



ks 



{S)s 



K 



mean input signal (S*) 
mean output signal (i?) 



In reality, while Eq. ( 25 1 gives the correct result for the 
it is manifestly incorrect for the 
jg. For example for 



500, kd = 100, kr = 900, K = 0.09 (26) 



we find from kinetic Monte Carlo simulations {S)ss = 5, 
as predicted by Eq. (25), but {R)ss ~ 290, whereas 
Eq. ([25| predicts kr/{{S)^s + K) = 176.82. This failure 
of the mean-field prediction arises because of the non- 
linearity of the Michaelis-Menten-like form of the pro- 
duction propensity for R. 

Our aim is to compute the differential gain, 
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d\n{R), 



(27) 



which describes the local steepness of the signal-response 
relation {{R)ss ^ {S)is)- The gain measures the sensitiv- 
ity of the system's output {R)ss to its input {S)ss', this 
can be computed by measuring the sensitivities of {R)ss 
and {S)ss to the production and degradation rates of the 
signal molecule. Let us suppose that the signal {S)ss is 
varied by changing its production rate ks infinitesimally 
at fixed degradation rate kd (we could have chosen in- 
stead to vary kd or, in principle, both ks and kd)- The 
gain is then 
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d\n{R),Jdks 



ks d{R)s 



dln{S)Jdks {R),s dks 



(28) 



where the second equality follows from the fact that 
d\n{S)ss/dks = l/ks, since {S)ss — ks/kd- We use the 
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FIG. 2: (color online) Steady state absolute diflferential gain 
for the stochastic focusing model in Section [IV B| where fe^is 
varied to control {S)ss, with other parameters as in Eq. (26 1. 



Open circles are Gillespie simulations using the time-average 
correlation function method with time step pre-averaging; er- 
ror bars are estimated by averaging over 10 trajectories of 
length 10® steps. The history array length was n — 2 x 10^. 
The filled circle (blue) is from a Gibson-Bruck simulation 
in COPASI using the ensemble-average correlation function 
method; error bars are from block averaging (10 blocks of 
(25) 10'' samples). The thick solid line (red) is the numerical re- 
sult from the finite state projection (FSP) algorithm. The 
dashed line is the mean-field theory (MFT) prediction. 



methods described in Section [Till to compute the steady- 
state sensitivity d{R)ss/dks, and hence the gain g. 

Figure [2] shows the absolute differential gain \g\ 
computed using the time-averaged correlation function 
method, with time-step pre-averaging, as a function of 
the signal strength {S)ss, as ks is varied (note that in 
this region the actual gain is negative so \g\ = — g). 
The results are in excellent agreement with the finite 
state projection method (FSP, see Supplementary Ma- 
terial Section |4]) . Fig. [2] (dashed line) also shows the 
mean-field theory prediction derived from the second of 
Eqs. (251, namely g — {S)ss/i{S)ss + K). Stochastic 
focusing, as predicted by Paulsson et al 18 , is clearly 
evident: the gain is much greater in magnitude for the 
stochastic model than the mean-field theory predicts, im- 
plying that fluctuations greatly increase the sensitivity of 
the output signal to the input signal [TO] . 

In this example, the parameter of interest [ks) is the 
rate constant of a single reaction (production of S). As 
discussed in Section [Ilj this implies that the parameter 
sensitivity can be computed simply by counting the num- 
ber of times this reaction is visited, which can be achieved 
by modifying the reaction scheme to 



^S-f-Q, S^0, 



fe* „ 1 
R ^ 



then computing the weight function 
1 



[Q{t) - kst] 



(29) 



(30) 
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FIG. 3: (color online) Representative time traces of U and 
V in the Gardner et al. genetic switch model (Section IV C I. 
Parameters are qi = 50, 



2.5, a2 = 16, and 7 = 1. 



(which is the analogue of Eq. (Ill), and using this to 
obtain the relevant time-correlation functions. This re- 
quires no changes to the simulation algorithm, making 
it easy to use with existing software packages. As a 
demonstration, we computed the differential gain for the 
parameters in Eq. ( |26[ ), using the open source simula- 
tion package copasi[14]. To achieve this, we used the 
Gibson-Bruck algorithm (as implemented in COPASi) to 
generate samples of Wk, and R at equi-spaced time points 
with a spacing At = 10 time units (chosen to be longer 
than the expected relaxation time of the output signal, 
set by the decay constant for R). By taking the differ- 
ence between successive time points we compute AVF^^ 
and hence the correlation function defined in Eq. (16 1. 



The result, shown in blue in Fig. [2j is in good agreement 
with our other calculations. 



C. A bistable genetic switch 

As a final example, we consider a model for a bistable 
genetic switch, of the type constructed experimentally by 
Gardner et al. |20j . in which two proteins U and V mu- 
tually repress each other's production. We suppose that 
transcription factor binding to the operator is coopera- 
tive, and can be described by a Hill function; the rate 
of production of protein U is then given by ai/(l -I- V^) 
while the rate of production of V is given by 02/(1 -I- ?7^) 
(here ai and a2 describe the maximal production rates 
while /? and 7 are the Hill exponents, describing the de- 
gree of cooperativity) . The units of time are fixed by set- 
ting the degradation rates of U and V to unity. For a suit- 
able choice of parameter values, stochastic simulations of 
this model show switching between a U-rich state and 
a V-rich state, as illustrated in Fig. [3) the steady-state 
probability distribution Pq for the quantity q = U — V 
is bimodal, as shown in Fig. [3] We use this example to 
illustrate the computation of parameter sensitivities for 
more complicated scenarios where the system property 



FIG. 4: (color online) Steady state probability distribution 
for the order parameter q = U — V in the Gardner et al. 
switch, and the sensitivity to /3. Points (with small error bars 
in the case of the sensitivity) are from Gillespie simulations 
using the time-average correlation function method with time 
step pre-averaging; error bars are estimated by averaging over 
10 trajectories of 10^ steps. The history array length was 
n = 5 X lO''. The solid lines (red) are the results of FSP 
applied to this problem. Model parameters are as in Fig. [3] 



of interest is not simply a mean copy number and the 
parameter of interest is not a simple rate constant. In 
particular we compute the sensitivity of the steady-state 
probability distribution Pq to the Hill exponent /?. 
Our model consists of the following reaction scheme: 



1 TT 2 



3 Tr 4 

^ V ^ 



(31) 



in which proteins U and V are created and destroyed with 
propensities given by: 



ai = 



ai 



1 + 1//3' 



02 = C^, as = 



a2 



ai = V (32) 



Let us first suppose we wish to compute dPq/dl3 using 
the time-averaged correlation function method, without 
time step pre-averaging. We use the propensity func- 
tions in Eqs. (32 1 to run a standard kinetic Monte Carlo 



(Gillespie) simulation, choosing at each step a next reac- 
tion and a time step St. At each simulation step, we also 
compute the quantity 



91nai yf^lnV 



dp 



(33) 



and update the weight function W according to Eq. (10 1, 
i. e. if reaction 1 is chosen as the next reaction, we incre- 
ment W by z(l — aiSt), otherwise, we increment W by 
—zaidt (note that it is correctly ai that features in this, 
irrespective of the chosen next reaction). We keep track 
not only of the current value of W, but also of its value 
a fixed number n steps ago. At the same time, we keep 
trac k of the function of interest (denoted / in Sections [ll] 
and III I . Because we are computing the parameter sensi- 
tivity of the distribution Pq, we have a function /g, and a 
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time-correlation function Cq^time av('^), for each value of 
q. At each simulation step, we check the current value 
of q{t). The function fg is unity if g = q(t) and zero 
otherwise (i. e. fq = Sq(^t),q)- For each value of q, we 
then compute the time correlation function Cg^timo &v{n) 
as prescribed in Eq. ( [l6| . As long as nf (where f is the 
global average time step) is longer than the typical re- 
laxation time of the system, Cq^timo avin) should give a 
good estimate for dPq/dp. 

If we are instead using time step pre-averaging, we em- 
ploy a slight modification of the above procedure. At 
each simulation step, we choose a next reaction, but we 
do not choose a time step St. In our update rules for 
W, we replace St by the state-dependent mean timestep 
T where = J2fi=i "^m- "^^^^ keeping track of W 
and fq we also need to compute at each step 



dr 

'dp 



ai 



dr 91nai 



(34) 



This quantity is then used to compute Cg^timo avin) and 
hence dPq/dp using the modified algorithm given in Sup- 
plementary Material Section [3j 

An important technical point here concerns the re- 
laxation time of the system, or the number of steps 
n over which we need to remember the system's his- 
tory in order that the correlation function C{n) gives 
a good estimate of the steady-state parameter sensitiv- 
ity. For the previous examples studied, this timescale 
was given by the slowest decay rate (typically that of the 
protein molecules). The genetic switch, however, shows 
dynamical switching behaviour on a timescale that is 
much longer than the protein decay rate (see for example 
Fig. |3| . We therefore need to choose a value of n such 
that nf is longer than the typical switching time. Ki- 
netic Monte-Carlo simulations (like those in Fig.js]) show 
that for our model, the typical time between switching 
events is approximately 160 time units, while the global 
average time step f « 0.02. The typical number of steps 
per switching event is therefore 160/0.02 = 8 x 10'^. Our 
chosen value of n should be at least this large. In prac- 
tice we find that the correlation functions are fully con- 
verged (to within a reasonable accuracy) by n = 5 x 10^ 
steps (w 6.3 switching events), but not quite converged 
by n = 2 X 10^ steps (« 2.5 switching events). These 
lengthy convergence times mean that much longer sim- 
ulations are needed to obtain good statistical estimates 
for the parameter sensitivity in this model than in the 
previous examples. 

Figure |4] shows the steady state probability distribu- 
tion Pq together with its sensitivity dPq/df3, computed 
using the time-averaged correlation function method with 
time step pre-averaging, for the same parameters as in 
Fig. [3] This method gives results in excellent agreement 
with FSP. Pq has the bimodal shape typical of a stochas- 
tic genetic switch, with a large peak at g ~ — 15 and 
a much broader peak around g « 45, with a minimum 
around g « 5. The sensitivity coefficient dPq/d(3 mea- 
sures how the behaviour of the switch depends on the 



cooperativity /3 of binding of the transcription factor V. 
We see that increasing /3 leads to an increased peak at 
g < 0, and a decreased peak at g > 0, in other words 
the switch spends more time in the V-rich state. Also 
the minimum around g « 5 decreases, suggesting that 
the switching frequency decreases as /3 increases. This is 
confirmed by further study using the ensemble-averaged 
correlation function method of the sensitivity coefficient 
of the switching frequency to changes in /3; the details of 
this will be presented elsewhere. 



V. DISCUSSION 

In this paper, we have shown how trajectory reweight- 
ing can be used to compute parameter sensitivity coeffi- 
cients in stochastic simulations without the need for re- 
peated simulations with perturbed values of the param- 
eters. The methods presented here are simple to imple- 
ment in standard kinetic Monte Carlo (Gillespie) simu- 
lation algorithms and in some cases can be used with- 
out any changes to the simulation code, making them 
compatible with packages such as COPASI [Uj. For com- 
putation of time-dependent sensitivity coefficients, the 
method involves tracking a weight function (which de- 
pends on the derivative of the propensities with respect 
to the parameter of interest) and computing its covari- 
ance with the system property of interest, at the time 
of interest, across multiple simulations. For comput- 
ing time-independent steady-state parameter sensitivi- 
ties, we show that the sensitivity coefficient can be ob- 
tained as the long-time limit of a time correlation func- 
tion, which can be computed either across multiple simu- 
lations (ensemble-averaged correlation function method) , 
or as a time average in a single simulation run (time- 
averaged correlation function method). We further show 
that time step pre-averaging removes the need to choose 
a new time step at each simulation step, significantly 
improving computational efficiency. In either the time- 
dependent or the time-independent case, it is a trivial 
matter to compute multiple sensitivity coefficients (e.g. 
with respect to different parameters) at the same time - 
one simply tracks each of the corresponding weight func- 
tions simultaneously. 

In deterministic models, parameter sensitivity coeffi- 
cients can be computed by simultaneous integration of 
a set of adjunct ODEs, alongside the set of ODEs de- 
scribing the model (see Eq. (2|). We consider the trajec- 
tory reweighting approach described here to be the ex- 
act stochastic analogue of the adjunct ODE method; the 
integration of the adjunct ODEs alongside the original 
ODEs is directly analogous to the procedure of generat- 
ing a trajectory weight alongside the normal trajectory 
in a kinetic Monte-Carlo scheme. Indeed, one can de- 
rive an adjunct chemical master equation by taking the 
derivative of the chemical master equation with respect 
to the parameter of interest; it turns out that the trajec- 
tory reweighting scheme is essentially a stochastic solu- 
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tion method for the adjunct master equation f3T|. 

In Section IV B[ we demonstrated the use of trajec- 
tory reweighting to compute parameter sensitivities, and 
hence the differential gain, for a model of a stochas- 
tic signaling network. We believe that this approach 
has widespread potential application to signaling path- 
ways, because it can be implemented for any existing 
model without any modifications to the underlying ki- 
netic Monte-Carlo simulation code. As long as a stochas- 
tic input signal is generated by a process — )■ S — )■ 0, one 
can use the ghost particle trick to compute the sensitiv- 
ity of any quantity of interest to the input signal (con- 
trolled by varying the rate kg of the signal production re- 
action) by modifying the production step to — > S + Q, 
computing the weight function from Wk^ = Q/kg — t 
(see Eq. (30)) and computing the appropriate correla- 



tion function for its covariance with the system property 
of interest. As proof-of-principle we calculated the dif- 
ferential gain for the model in Section IV B (see Fig. [2]), 
using COPASI [13] to generate the simulation data, and a 
standard spreadsheet package to compute the correlation 
function. 

In Section |IV C[ we used the methodology to com- 
pute the sensitivity of the probability distribution func- 
tion for a bistable genetic switch, to the degree of co- 
operativity (Hill exponent) of binding of one of its tran- 
scription factors. This example demonstrates that tra- 
jectory reweighting is not a panacea for all problems. 
The bistable genetic switch has a long relaxation time, 
which requires the correlation function of the weight to 
the computed over long times, with a corresponding need 
for large sample sizes to obtain good statistical sampling. 
While trajectory reweighting works for this example, pre- 
liminary attempts to compute the parameter dependence 
of the switching rate show that finite differencing may 
be more efficient. In fact Plyasunov and Arkin [5] al- 
ready discuss in which cases it may be more efficient to 
use finite-differencing. Because of their long relaxation 



times, genetic switches are notoriously difficult to study 
in stochastic simulations [22]. A plethora of sophisti- 
cated schemes have been developed to address this prob- 
lem [23^-'55' , some of which could perhaps be extended to 
incorporate trajectory reweighting. 

The present study considers how to compute parame- 
ter sensitivity coefficients — i. e. first derivatives of system 
properties with respect to the parameters. The same ap- 
proach can, however, easily be used to compute higher 
derivatives, such as the Hessian matrix, as discussed in 
Supplementary Material Section [TJ This raises the pos- 
sibility of combining the present methods with gradient- 
based search algorithms, to make a sophisticated param- 
eter estimation algorithm for stochastic modeling. This 
would offer a novel approach to a major class of problems 
in systems biology. 

To summarise, we believe the trajectory reweighting 
schemes presented here are an important and useful ad- 
dition to the stochastic simulation toolbox. Further re- 
search should address in detail their performance with 
respect to existing methods [3HS] and their application 
to challenging models such as those with long relaxation 
times, as well as their potential for use in more sophisti- 
cated parameter search algorithms. 
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SUPPLEMENTARY MATERIAL 



1. DERIVATIVES OF AVERAGES WITH RESPECT TO PARAMETERS 

Here, we present a convenient way to compute the derivatives of average quantities with respect 
to the parameters of the model, that are required to arrive at Eqs. ^ and ([T]) in the main text. We 
also show that this method generalizes easily to higher derivatives. 

Noting that in the perturbed system the parameter ka has been changed to k'^, we use Eq. (jsj) in 
the main text to write the average of the function / in the perturbed system as 

where 

Rik'^,K)^^,=e'''''^'''^^~'''''^'''^. (36) 

The function R{k'^,ka) has the property that dR/dk'^ = W'^, R where Wj^, = d\nP{k'^)/dk'^. We 
then have 

djfY ^ {m,R) _ {fR){Wl,R) 

dK {R) {R? ■ ^ ^ 

Taking the limit k'^ — )■ ka (for an infinitesimal perturbation), and noting thereby that i? — )■ 1 and 
Wk^, yields Eqs. ^ and ([7| in the main text. 
Taking this procedure further allows the computation of higher derivatives; one can show for 
instance that the Hessian is 



where 



dk dk \J l^a'^Kp/ I \J '^KaKpl \J ' ^ Ka I \' ^ K p I \J ' ^ K p I X'" ka I (38) 

-{f){WkaWk,)-{f)mak,)+2{f){Wka){Wk,). 
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Eq. (38) is potentially useful for gradient search algorithms. This expression is likely to simplify in 
many cases - for instance we expect that ff^hiP / dkadkp often vanishes for kj^. One might also 
use the fact that {WkJ) = (W^fc^) = 0, but it may improve the statistical sampling to retain these 
terms (see discussion in main text). 

2. EXACT RESULTS FOR PROBLEMS WITH LINEAR PROPENSITIES 

In this Section we describe some exact results that can be obtained for models with linear propen- 



sity functions, in particular for the correlation functions defined in Eqs. (12) and (13) in the main 
text. The analysis draws heavily on established literature results (which we summarize below). More 
details and links to earlier literature can be found in the appendix to Supplementary Ref. [261 

To fix notation, let us suppose that the a-th propensity function depends linearly on the copy 
numbers iVj, namely = ^ajNj + ba where Kaj and ba are constants which we assume to be 
proportional to the rate consant ka- Our aim is to compute the sensitivity coefficients d{Ni) /din k^- 

It is well known that for linear propensity functions the moment equations close successively. Thus, 
the mean copy numbers (Ni) obey 

^ = = E.K.ANj) + (40) 

where z/j^ is the stoichiometry matrix (describing the change in the copy number of the i-th species 
due to the firing of the a-th reaction), = J2a ^iaKaj and bi = i^iaba- Note that Kij is usually 
asymmetric. For the second moments, the variance-covariance matrix Sij{t) = {ANi{t) ANj{t)), 
where AA^i = Ni - {N^), obeys 



dt 

where 



^kiKikSjk + KjkSik) + Hij (41) 



Hij = Y.o,^iaVja{aa)- (42) 

Note that Sij{t) is symmetric. Finally the time-ordered two-point correlation functions 
{ANi{t) ANj{t')) with t > t' obey a regression theorem 

^'^^'"i^^'"'» ^S.^.(AiV.(.)AA.,(.0). (43) 

This concludes our survey of the established literature results. 
We now employ the ghost particle trick of Section III] in the main text, and suppose that reaction 



a now creates a noninteracting species Qa, in addition to its usual products. Following Eq. (40), 
the mean copy number (Qa) obeys 

^ = (aa). (44) 



For the second moments, Eq. (41) becomes 



f)Q. 

= Ej {K,,Sj^ + K^,S,,) + H,^. (45) 

where Sia = {ANi AQ^) ■ The second term in this can be rewritten as 

EjKajS,, = (AiV,(E,^a,AiV,)) = (AiV,Aa,). (46) 
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FIG. 5: (color online) Growth of variance of weight functions in steady state, normalised by the expected growth rate. Model 
and parameters as for Fig. [l] in the main text. 



The stoichiometry matrix entry for Qa consists of a '+1' for the a-th reaction, and zero elsewhere, 

(47) 



so that Eq. (42) becomes = i'ia{0'a)- Finally we have 



dt 

By similar argumentation we also obtain 



dt 



(45 



where 



Armed with these results, let us now turn to the problem of computing the weight function Wk^- 



We write the continuous-time analogue of Eq. (11) in the main text: 

-t 



kaWk^t) = Q^{t) - / dt'ao^it') 



(49) 







(note that by substituting Eq. (44) into Eq. (49 ) we recover our previous observation that (Wk^) = 0). 
Again writing W^in/ca = kaWk^, it follows from Eq. (49) that 



{NmnkJ = {AN,AWi,,k^) = Siait) - [ dt'{AN,{t)Aa^{t')). 

Jo 

Thus, now noting explicitly the time-dependence of the various terms, we obtain 
d{N,WinkJ dS, 



-{AN,{t) Aa„(t)) 







'^^, d{AN,it) Aa^jt')) 
dt 



dt dt 

Exploiting the linearity of the propensity functions, the regression theorem implies 

d{ANi{t) Aa^{t')) 



dt 



Y.,K,,{AN,{t)Aa^{t')). 



Hence 



d{Nmnk:) dSi, 



dt 



dt 



{AN,{t) Aa^it)) - J2 Ki, / dt' {AN,{t) Aa„(t')) 



(50) 



(51) 



(52) 



(53) 
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Eliminating the time integrals between this and Eq. ( 50 ) , we get 



dt 



dt 



(54) 



Eliminating dSia/dt between this and Eq. (47) gives finally 

d{N,W,nkJ 



dt 



^ia (fla) ■ 



(55) 



This is an ODE which give the evolution of (A^jH^irifc^,) in terms of known quantities. It can be 



compared with the adjunct ODE that is obtained by differentiating Eq. (40) with respect to In/c, 



The two ODEs are identical and share the same initial conditions. For this specific case, this is a 
direct proof of the general result in the main text, namely that {NiW\n_ka) = d{Ni) /dlnka- Whilst 
this is interesting, it is not quite what we are after, which is a theory for the correlation functions 
defined in the main text. To find this we first note a generalisation of the regression theorem is 



d{N,{t)Wy^kAto)) 
dt 



J:kK^J{N,{t)W,^kAto)). 



(56) 



Subtracting this from Eq. (|55j) generates a set of coupled ODEs for the correlation functions 
Cj(t,to) = {Ni(t)AWinko,(t^'to)), which should be solved with the initial conditions Ciit.to) = 
at t = to- In steady state these ODEs are 



dQjAt) 
d{At) 



Y.,.K,fi,{At) + 



(57) 



The initial conditions are Ci(0) = 0. This is the key result of this Section, as in principle it allows 
for explicit calculation of the correlation functions. Comparing to the adjunct ODE obtained by 



differentiating Eq. (40) with respect to Infe^, we see that for this case we also have a direct proof 
of the general result claimed in the main text, that Ci{At) — t- d{Ni)/ dlnka as At — > oo. For the 



the results given in Eq. (22) in the main text 



constitutive gene expression model in Section IVA in the main text, we solved Eq. (57) to obtain 



To complete the general discussion here, let us derive an expression for the variance of AWir^ka 



From the definition in Eq. ( 49 ) we have 



Thus 



In ka 



{W,^k. 



) dSa 



[ dt' {AQa{t) Aa^it')) + [ dt' [ dt" (Aa„(t') Aa„(t")) 
Jo Jo Jo 



{51 



dt 



dt 



2(AQ„Aa„) 



2 1'dt' 9{^Q-it)^aait')) ^ 2 fdt' (Aa„(t) Aa^it')). (59) 
'o Jo 



The last two terms in this cancel, on account of the regression theorem. Further cancella- 



tions occur when Eq. (48) for dSaa/dt is inserted, giving finally d{Wi^i^^) /dt = Since 
Wink^it) = Wink^{to) + AWink^, and Wink^{to) and AWink^ are uncorrelated, it follows that 
{W,ikAt)) = (W^in\Jto)) + {AW,l,J. Integrating d{W,U/dt = (a,) and inserting in this last 
expression gives (AVFj^^^) = J^dt' (aa). As a particular case, in steady state, (AVFj^^^) = {aa) At. 
Thus we do indeed see that in steady state AWink^ has a zero mean and a variance that grows 
linearly in time, justifying our claim that it behaves essentially like a random walk. Some results 
confirming this analysis are shown in Fig. [5j 
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3. TIME STEP PRE- AVERAGING 



In the time step pre-averaging approach, we do not select time steps as part of our kinetic Monte 
Carlo algorithm, but instead use the state-dependent average time step r = 



expression for the time average of quantity /, as in Eq. (18) in the main text. For the purposes of 
this Section, we define a new notation: 



m our 



E(/r) 



y fr 

A-/ steps J 



(60) 



steps 



in which the sum is over the values of the system function / multiplied by the (state-dependent) 
mean time step, computed at each step along a kinetic Monte-Carlo trajectory of length iVgteps- Note 
that E(/r) can be computed using an algorithm that does not keep track of time but only of the 



choice of reaction channel. We can then rewrite Eq. ( 18 ) in the main text as 



/ 



Esteps ^ E(/r) 
EsteDs ^ E(r) 



(61) 



When using time step pre-averaging, the correlation function Eq. (16) in the main text must be 
modified because the relative probability of generating a given sequence of states (Eq. (|4]) in the 
main text) takes a different form when the algorithm does not keep track of time, and because the 
average time step r in Eq. (61) itself usually depends on the parameter in question. 



In a kinetic Monte Carlo scheme in which the next reaction is selected as normal, but time is not 
tracked, the probability of generating a given trajectory is proportional to 

^ = risteps K/EuAm)- 



(62) 

(i.e. the part of Eq. (|4]) in the main text concerning the time step distribution is discarded). In 
analogy to Eq. ([s]) in the main text, one can write the average E'(/r) for the perturbed problem in 
terms of averages over unperturbed trajectories: 



E'(/r) 



Hfr'P'/P) 
E{P'/P) ■ 



(63) 



Taking derivatives of Eq. (63) with respect to the parameter ka as described in Section [T] above, it 
follows that 



^^^^^^ - E(/|^) + EifrW^J - E(/r) E(H^,„ 



dka 



dkn 



(64) 



where the fact that r depends on ka leads to an extra term (the first term) in comparison to Eq. (|6]) 
in the main text. Computing W^^ = din. P /dka using Eq. (62), it turns out that Wk^ = Esteps ^^^q 
has the same form as before, but with St replaced by r. 



6Wk 



dkn 



dkr, 



-T. 



(65) 



Since the weight function in Eq. (65) behaves like a random walk, steady-state parameter sensitivities 



should be computed using the correlation function trick (as in Section III in the main text). From 
Eq. (64) we have: 



(66) 
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where 

C{n) = EifrAWkM) - mr)E{AWkM)) (67) 



with AWk^{n) given by Eq. ( |I7| in the main text, using the present Eq. ( |65| ) to generate Wk^. 

(68) 



Finally, the parameter sensitivity coefficient itself is given by differentiating Eq. (61) 

dj 1 9E(/r) E(/r)9E(r) 



dka E(r) OK E(r)2 dk^ 



The quantity (9E(r)/9/cQ in the second term is given by setting / = 1 in Eqs. (66) and (67). When 



using time step pre-averaging in combination with the time-averaged correlation function method. 



one computes the parameter sensitivity coefficient using Eq. (68) rather than simply taking the limit 
of the correlation function as n — )■ oo. 



We note that while Eq. (68) looks formidable, its actual computation is fairly straightforward. 
To obtain both / and df /dka, one computes trajectory averages of the set of quantities defined by 
{1, /} ® {1, r} {1, AWk^{n)} , together with dr/dka and fdr/dk^- These averages are calculated 
by summing the respective quantities along the trajectory and dividing by the number of steps. 

4. THE FINITE STATE PROJECTION ALGORITHM 

The master equation describes the evolution of the probability P{Ni] t) that a system is in the 
state Ni at time t. For the sake of compactness we will adopt the notation s and s' for the states Ni 
and NI respectively. The master equation is [1] 

dPs 

-Q^ = Zls'K's-Ps' - Wss'Ps] (69) 

where Wss' is the transition rate from s to s', given by 

_ J a^{Ni) /i-th reaction is Ni — )■ N-, , . 

Wss' ~ I otherwise. ^ ^ 

The finite state projection (FSP) algorithm is a numerical solution scheme for the master equation 
based on the idea of truncating the state space. For full details of the original FSP algorithm we 
refer to the work of Munksy and Khammash [15]. Here we outline the basic principles of the scheme 
and the small changes needed to adapt it to the computation of steady-state sensitivity coefficients. 



The starting point is to note that Eq. (69) is a linear ODE for P, and may be written in the matrix 
form 

dP 

where A is an infinite-dimensional sparse matrix. To make this into a tractable numerical propo- 
sition, the FSP algorithm truncates the state space to a finite size D. The truncation is chosen so 
as to contain almost all of the probability P{Ni) under the conditions of interest. For the prob- 
lems encountered here, a (hyper-)rectangular truncation scheme works, N^ < Ni < N^, for which 
D = ANi where ANi = N^ — Ni + 1. The question then is how to handle the states not included 
in the truncation scheme. In the original FSP algorithm the extra states are lumped together into 
a single meta-state. All the transitions leaving the truncated state space are connected to this new 
meta-state, and all the transitions entering the truncated state space are discarded. With this ap- 
proximation A becomes a. {D + 1) x [D + 1) sparse matrix, and one can use standard numerical 
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methods to exponentiate the matrix and advance the probabihty distribution, i. e. P{t) = e"^* "PlO). 
The advantage of introducing the meta-state is that Munsky and Khammash can prove some so- 
phisticated truncation theorems which provide a certificate of accuracy for the scheme. 

For the present problem we are interested in the steady state probabihty distribution P^^. However 
the meta-state is an absorbing state, which frustrates the direct computation of P'^'^. To avoid this, 
we discard all transitions which leave or enter the truncated state space whilst, obviously, retaining 
all the transitions contained entirely within the truncated state space. The meta-state is then no 
longer needed and A becomes a D x D sparse matrix. The steady state distribution is found by 
solving A ■ P'^^ = 0, in other words P*^*^ is the right-eigenvector of A belonging to eigenvalue zero. 
That such an eigenvector exists is a textbook argument [1]: conservation of probability, J2s = 1, 
implies Y^gdPg/dt = and hence 1 ■ A = where 1 is a row- vector with entries all equal to 
unity. Since therefore 1 is a /e/it-eigenvector of A with eigenvalue zero, it follows under mild and 
non-restrictive conditions that there is a corresponding ng'/it-eigenvector of A with the same 
eigenvalue. This is the desired steady state probability distribution. 

Well-established numerical methods exist to obtain the eigenvectors of a sparse matrix. For the 
present problems we have used the functionality provided in MathWorks matlab. For an open- 
source solution, we have also had good success with the octave interface to arpack which im- 
plements an implicitly restarted Arnoldi method [27]. From a practical point of view, we find we 
are limited to truncated state spaces of size D < 25 000 for matlab, and somewhat smaller for the 
OCTAVE interface to arpack. This effectively limits consideration to problems involving at most 
two state variables iVj [i = 1, 2) and motivates the choice of examples in the main text. 

Once P'^'^ is found we can calculate (/) = J2s fsf's'^- Sensitivity coefficients like d{f) /dk^ are then 
found by solving the FSP problem at ka and ka + h and using Eq. ^ in the main text, with h 
typically being a few percent of ka- Note that although the master equation describes a stochastic 
process, it is itself a deterministic ODE. Hence this method of computating sensitivity coefficients 
by finite differencing is appropriate. In the absence of truncation theorems, convergence is verified 
empirically. 



